Multivariate TS Models

Multivariate Analysis

Based on the data and previous univariate analysis, we will begin conducting multivariate analysis in this section. Our data science questions include:

  • What factors interact with the US Dollar Index?
  • What exogenous variables affect the US Dollar Index?
  • What factors are influenced by the US Dollar Index?

Before constructing Multivariate Time Series Models, we will first conduct a literature review (see the Introduction part) and analyze the mechanisms among variables from an economic perspective.

Interacting Variables

Trade Balance

  • Trade Deficit → Dollar Depreciation: Higher imports than exports lead to a trade deficit, requiring capital inflows but increasing dollar supply, causing depreciation.

  • Dollar Depreciation → Trade Deficit Reduction: A weaker dollar boosts US exports and reduces imports, helping balance the deficit.

  • The expansion of the trade deficit leads to the depreciation of the US dollar.

    A trade deficit means that the US imports more than it exports. According to the Balance of Payments Identity (Current Account + Financial Account = 0), a trade deficit (i.e., a negative current account) means that the financial account must be positive. This is reflected in capital inflows. Typically, the US issues government bonds and attracts foreign investment to balance the trade deficit. However, the increased supply of dollars in the foreign exchange market will cause the dollar to depreciate, or foreign investors may become concerned about the US fiscal situation, leading to reduced demand for the dollar and further depreciation.

  • The depreciation of the dollar leads to a reduction in the trade deficit.

    When the dollar depreciates, US goods and services become relatively cheaper in international markets, which may increase US exports. At the same time, imported goods become relatively expensive, leading to a reduction in US imports. Therefore, the depreciation of the dollar generally reduces the trade deficit.

Global Commodity Market

  • A stronger dollar leads to a decrease in global commodity prices.

    Since the dollar is the global reserve and trading currency, commodities (such as oil, gold, copper, aluminum, etc.) are typically priced in dollars. When the dollar appreciates, the cost for holders of other currencies to buy dollar-priced commodities increases, which leads to a reduction in demand and lowers commodity prices. Additionally, investors may shift funds from physical assets like commodities into dollar-denominated assets, further driving down commodity prices.

  • Changes in commodity prices also influence the US Dollar Index.

    A rise in commodity prices, especially oil and metal prices, often puts pressure on global inflation. This, in turn, affects the Dollar Index. Moreover, commodity price fluctuations often reflect changes in the global economy. For example, an increase in oil prices may indicate a slowdown in global economic growth or supply-demand imbalances. These factors could lead to higher demand for the dollar as a safe-haven currency, thus pushing up the Dollar Index.

Stock Market

  • The risk and return associated with stocks are relatively high, and the dollar can serve as a safe-haven asset. Therefore, there is a competitive relationship between the two. Generally speaking, the stock market and the Dollar Index have an inverse relationship. When global investors have a higher risk appetite or when the stock market is strong, capital tends to flow into the stock market. This reduces demand for the dollar and leads to a depreciation of the dollar. On the other hand, when the dollar appreciates, investors may prefer to move funds into low-risk assets like bonds. This leads to capital outflows from the stock market and puts downward pressure on it, especially on overvalued stocks. However, a stronger dollar may also reflect a robust economy, which could push stock prices higher.

Gold Market

  • The dollar and gold also tend to show inverse fluctuations due to their competitive relationship.

  • A stronger dollar leads to a decrease in gold prices.

    First, gold prices are usually denominated in dollars. When the dollar appreciates, the cost of gold in other currencies increases, reducing demand and lowering gold prices. Secondly, a stronger dollar may be due to the Federal Reserve raising interest rates. In this case, investors are more likely to invest in higher-yielding assets (such as dollar-denominated bonds) and reduce holdings in non-interest-bearing assets like gold, leading to a decline in gold prices.

  • An increase in gold prices leads to the depreciation of the dollar.

    When the dollar weakens or global economic uncertainty rises, the dollar may no longer be considered a safe-haven asset, and gold becomes the preferred choice. Investors may tend to buy gold as a hedge against inflation or currency devaluation, leading to decreased demand for the dollar and causing it to depreciate.

GDP

  • Economic growth drives dollar appreciation.

    When a country’s GDP grows strongly, it typically reflects good economic performance and attracts foreign investors’ attention. This capital inflow (e.g., foreign direct investment, securities investment) increases demand for the country’s currency, which in our case is the US dollar, leading to dollar appreciation.

Unemployment Rate

  • A higher unemployment rate leads to dollar depreciation.

    • Mechanism 1

      The unemployment rate mainly affects the economy, which in turn influences the Dollar Index. There is an inverse relationship between the unemployment rate and GDP growth. A higher unemployment rate typically indicates an economic recession or depression, where businesses reduce hiring, which is often accompanied by a decrease in consumption and investment. This reduces the ability to attract foreign investment into the US, leading to reduced demand for dollar-denominated assets and a depreciation of the dollar.

    • Mechanism 2

      When the unemployment rate is high, the Federal Reserve may adopt loose monetary policies (e.g., lowering interest rates or quantitative easing) to stimulate the economy and reduce unemployment. Such loose monetary policies typically lead to dollar depreciation because lower interest rates reduce the relative attractiveness of the dollar, making it less likely to attract foreign capital inflows, thereby decreasing demand for the dollar.

CPI

  • An increase in the CPI drives dollar appreciation.

    When the CPI rises, it indicates inflation. In this case, the Federal Reserve may adopt a tightening monetary policy, such as raising interest rates, to control inflation. Higher interest rates make the dollar more attractive because they can attract foreign capital inflows, increasing demand for the dollar and thus driving up its value.

    Expectations are also a factor. If inflation expectations rise, markets may anticipate that the Federal Reserve will take more tightening measures, which could lead to dollar appreciation.

Exogenous Variables

Real Estate Market

  • A stronger dollar typically indicates a relatively strong US economy, and interest rates may rise. Higher mortgage rates increase borrowing costs, which can suppress demand for housing and lead to a decline in housing prices.

  • However, a stronger dollar can also boost confidence among homebuyers and global investors. This leads to increased capital inflows into US real estate, pushing housing prices up.

Therefore, the impact may be a combined effect.

Tourism

  • When the dollar appreciates, foreign tourists need to exchange more of their local currency for dollars, which increases their travel costs. Some tourists may opt for other travel destinations, leading to a decrease in the number of tourists visiting the US. Therefore, US tourism may decline with a stronger dollar.
Code
library(tidyverse)
library(ggplot2)
library(forecast)
library(astsa) 
library(xts)
library(tseries)
library(fpp2)
library(fma)
library(lubridate)
library(tidyverse)
library(TSstudio)
library(quantmod)
library(tidyquant)
library(plotly)
library(readr)
library(dplyr)
library(kableExtra)
library(knitr)
library(patchwork)
library(vars)

# Load data
invisible(getSymbols("DX-Y.NYB", src = "yahoo", from = "2005-01-01", to = "2024-12-31"))
dxy <- data.frame(Date = index(`DX-Y.NYB`), 
                       Open = `DX-Y.NYB`[, "DX-Y.NYB.Open"], 
                       High = `DX-Y.NYB`[, "DX-Y.NYB.High"], 
                       Low = `DX-Y.NYB`[, "DX-Y.NYB.Low"], 
                       Close = `DX-Y.NYB`[, "DX-Y.NYB.Close"])
colnames(dxy) <- c("Date", "Open", "High", "Low", "Close")
dxy <- na.omit(dxy)

bea <- read.csv("./data/bea.csv")
bea$time <- as.Date(bea$time)

gdp <- read.csv("./data/gdp.csv")
gdp$time <- as.Date(gdp$time)
gdp$total <- gdp$consumption + gdp$investment + gdp$net_export + gdp$government

data_unem <- read.csv("./data/unem.csv", header=TRUE)
data_unem$time <- as.Date(data_unem$time)

data_cpi <- read.csv("./data/cpi.csv", header=TRUE)
data_cpi$time <- as.Date(data_cpi$time)

invisible(getSymbols("^GSPC", src = "yahoo", from = "2005-01-01", to = "2024-12-31"))
invisible(getSymbols("BTC-USD", src = "yahoo", from = "2015-01-01", to = "2024-12-31"))

xau <- read.csv("./data/xau.csv")
xau$Date <- as.Date(xau$Date)

gsci <- read.csv("./data/gsci.csv")[2:2518,]
gsci$Date <- as.Date(gsci$Date)

house <- read.csv("./data/house.csv", header=TRUE)
house$time <- as.Date(house$time)

visitors <- read.csv("./data/visitors.csv", header=TRUE)
visitors$time <- as.Date(visitors$time)

egg <- read.csv("./data/eggs_price.csv")
egg$Date <- as.Date(egg$Date)

ir <- read.csv("./data/interest_rate.csv")
ir$Date <- as.Date(ir$Date)

oil <- read.csv("./data/oil_price.csv")
oil$Date <- as.Date(oil$Date)

# time series
dxy_ts <- ts(log(dxy$Close), start=c(2005,1), frequency=252)
balance_ts <- ts(bea$balance, start=c(2005,1), end=c(2024,3), frequency=4)
gdp_ts <- ts(gdp$total, start=c(2005,1), end=c(2024,3), frequency=4)
unem_ts <- ts(log(data_unem$unem), start=c(2005,1), end=c(2023,12), frequency=12)
cpi_ts <- ts(data_cpi$cpi, start=c(2005,1), end=c(2023,12), frequency=12)
sp5_ts <- ts(GSPC$GSPC.Close, start=c(2005,1), frequency=252)
xau_ts <- ts(log(xau$Price), start=c(2005,1), end=c(2024,52), frequency=52)
oil_ts <- ts(log(oil$Oil), start=c(2005,1), end=c(2024,12), frequency=12)
gsci_ts <- ts(log(gsci$Price), start=c(2014,252), frequency=252)
house_ts <- ts(house$index, start=c(2005,1), end=c(2024,4), frequency=4)
visitors_ts <- ts(log(visitors$count), start=c(2005,1), end=c(2024,12), frequency=12)
btc_ts <- ts(log(`BTC-USD`[,"BTC-USD.Close"]), start=c(2015,1), frequency=252)
egg_ts <- ts(log(egg$Price), start=c(2005,1), end=c(2024,12), frequency=12)
ir_ts <- ts(ir$IR, start=c(2005,1), end=c(2024,12), frequency=12)

set.seed(123)
library(kableExtra)

# ARIMA funtion
ARIMA.c = function(p1, p2, q1, q2, data) {
  d = 1
  i = 1
  temp = data.frame()
  ls = matrix(rep(NA, 6 * 100), nrow = 100)
  
  for (p in p1:p2) {
    for (q in q1:q2) {
          if (p + d + q <= 9) {
            
            model <- tryCatch({
              Arima(data, order = c(p, d, q), include.drift = TRUE)
            }, error = function(e) {
              return(NULL)
            })
            
            if (!is.null(model)) {
              ls[i, ] = c(p, d, q, model$aic, model$bic, model$aicc)
              i = i + 1
            }
          }
        }
      }
      temp = as.data.frame(ls)
      names(temp) = c("p", "d", "q", "AIC", "BIC", "AICc")
      temp = na.omit(temp)
      return(temp)
}

# SARIMA function
SARIMA.c = function(p1, p2, q1, q2, P1, P2, Q1, Q2, s, data) {
  d = 1
  D = 1
  i = 1
  temp = data.frame()
  ls = matrix(rep(NA, 9 * 100), nrow = 100)
  
  for (p in p1:p2) {
    for (q in q1:q2) {
      for (P in P1:P2) {
        for (Q in Q1:Q2) {
          if (p + d + q + P + D + Q <= 9) {
            
            model <- tryCatch({
              Arima(data, order = c(p, d, q), seasonal = list(order = c(P,D,Q), period = s))
            }, error = function(e) {
              return(NULL)
            })
            
            if (!is.null(model)) {
              ls[i, ] = c(p, d, q, P, D, Q, model$aic, model$bic, model$aicc)
              i = i + 1
            }
          }
        }
      }
    }
  }
  
  temp = as.data.frame(ls)
  names(temp) = c("p", "d", "q", "P", "D", "Q", "AIC", "BIC", "AICc")
  temp = na.omit(temp)
  return(temp)
}

highlight_output = function(output, type="ARIMA") {
    highlight_row <- c(which.min(output$AIC), which.min(output$BIC), which.min(output$AICc))
    knitr::kable(output, align = 'c', caption = paste("Comparison of", type, "Models")) %>%
    kable_styling(full_width = FALSE, position = "center") %>%
    row_spec(highlight_row, bold = TRUE, background = "#FFFF99")  # Highlight row in yellow
}

# Define a function to fit SARIMA and handle errors
fit_sarima <- function(xtrain, p, d, q, P, D, Q, s) {
  fit <- tryCatch({
    Arima(xtrain, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = s),
          include.drift = FALSE, lambda = 0, method = "ML")
  }, error = function(e) {
    return(NULL)  # Return NULL if an error occurs
  })
  return(fit)  # Return the fitted model (or NULL)
}

VAR: USD ~ Trade Deficit + Global Commodity Price + CPI

Here, we will analyze the interactions between the USD, trade deficit, global commodity prices, and CPI.

Code
dxy_m <- dxy %>%
  mutate(month = floor_date(Date, "month")) %>% 
  group_by(month) %>%
  summarise(Close = mean(Close, na.rm = TRUE))
dxy_q <- dxy %>%
  mutate(quarter = floor_date(Date, "quarter")) %>% 
  group_by(quarter) %>%
  summarise(Close = mean(Close, na.rm = TRUE))
gsci_q <- gsci %>%
  mutate(quarter = floor_date(Date, "quarter")) %>%
  group_by(quarter) %>%
  summarise(gsci = mean(Price, na.rm = TRUE))
cpi_q <- data_cpi %>%
  mutate(quarter = floor_date(time, "quarter")) %>%
  group_by(quarter) %>%
  summarise(cpi = mean(cpi, na.rm = TRUE))

df1 <- data.frame(Date = gsci_q$quarter, 
                  dxy = log(dxy_q$Close)[41:80], 
                  deficit = log(abs(bea$balance))[41:80],
                  gsci = log(gsci_q$gsci),
                  cpi = log(cpi_q$cpi)[41:80]
                  )

plot_usd <- plot_ly(df1, x = ~Date, y = ~dxy, type = 'scatter', mode = 'lines', name = 'U.S. Dollar')
plot_deficit <- plot_ly(df1, x = ~Date, y = ~deficit, type = 'scatter', mode = 'lines', name = 'Trade Deficit') 
plot_gsci <- plot_ly(df1, x = ~Date, y = ~gsci, type = 'scatter', mode = 'lines', name = 'Commodity Price') 
plot_cpi <- plot_ly(df1, x = ~Date, y = ~cpi, type = 'scatter', mode = 'lines', name = 'CPI') 

subplot(plot_usd, plot_deficit, plot_gsci, plot_cpi, nrows = 4, shareX = TRUE) %>%
  layout(title = "Trend of U.S. Dollar and Related Variables", showlegend = FALSE,
    xaxis = list(title = 'Date'),
    yaxis = list(title = 'U.S. Dollar'),
    yaxis2 = list(title = 'Trade Deficit'),
    yaxis3 = list(title = 'Commodity Price'),
    yaxis4 = list(title = 'CPI'))
Code
ts_df1 <- ts(df1[,-1], start = c(2015,1), frequency = 4)
VARselect(ts_df1, lag.max=6, type="both") 
$selection
AIC(n)  HQ(n)  SC(n) FPE(n) 
     6      6      1      2 

$criteria
                   1             2             3             4             5
AIC(n) -2.787866e+01 -2.851355e+01 -2.862475e+01 -2.831302e+01 -2.893424e+01
HQ(n)  -2.751123e+01 -2.790116e+01 -2.776740e+01 -2.721072e+01 -2.758698e+01
SC(n)  -2.680123e+01 -2.671783e+01 -2.611074e+01 -2.508073e+01 -2.498366e+01
FPE(n)  7.923879e-13  4.444468e-13  4.556769e-13  8.165450e-13  7.276663e-13
                   6
AIC(n) -2.997350e+01
HQ(n)  -2.838128e+01
SC(n)  -2.530463e+01
FPE(n)  6.699754e-13

Based on the results, the lag length could be 1, 2, or 6.

Code
summary(fit <- VAR(ts_df1, p=1, type="both"))

VAR Estimation Results:
========================= 
Endogenous variables: dxy, deficit, gsci, cpi 
Deterministic variables: both 
Sample size: 39 
Log Likelihood: 342.132 
Roots of the characteristic polynomial:
0.8721 0.8113 0.8113 0.3509
Call:
VAR(y = ts_df1, p = 1, type = "both")


Estimation results for equation dxy: 
==================================== 
dxy = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
dxy.l1      0.667614   0.124027   5.383 5.96e-06 ***
deficit.l1  0.004326   0.026047   0.166   0.8691    
gsci.l1     0.069535   0.029550   2.353   0.0247 *  
cpi.l1      0.248191   0.232994   1.065   0.2945    
const      -0.289282   0.914592  -0.316   0.7538    
trend      -0.002214   0.001532  -1.445   0.1578    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.02053 on 33 degrees of freedom
Multiple R-Squared: 0.8444, Adjusted R-squared: 0.8208 
F-statistic: 35.81 on 5 and 33 DF,  p-value: 2.07e-12 


Estimation results for equation deficit: 
======================================== 
deficit = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + const + trend 

            Estimate Std. Error t value Pr(>|t|)  
dxy.l1     -1.498555   0.747045  -2.006   0.0531 .
deficit.l1  0.421178   0.156890   2.685   0.0113 *
gsci.l1     0.030080   0.177986   0.169   0.8668  
cpi.l1      1.097922   1.403382   0.782   0.4396  
const       7.397317   5.508813   1.343   0.1885  
trend       0.005873   0.009226   0.637   0.5288  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.1237 on 33 degrees of freedom
Multiple R-Squared: 0.8249, Adjusted R-squared: 0.7984 
F-statistic: 31.09 on 5 and 33 DF,  p-value: 1.403e-11 


Estimation results for equation gsci: 
===================================== 
gsci = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
dxy.l1     -1.150435   0.516991  -2.225    0.033 *  
deficit.l1  0.171882   0.108576   1.583    0.123    
gsci.l1     0.703631   0.123175   5.712 2.25e-06 ***
cpi.l1      0.810695   0.971207   0.835    0.410    
const       0.570008   3.812359   0.150    0.882    
trend      -0.002896   0.006385  -0.454    0.653    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.08559 on 33 degrees of freedom
Multiple R-Squared: 0.8813, Adjusted R-squared: 0.8633 
F-statistic: 48.98 on 5 and 33 DF,  p-value: 2.532e-14 


Estimation results for equation cpi: 
==================================== 
cpi = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + const + trend 

             Estimate Std. Error t value Pr(>|t|)    
dxy.l1     -0.0531202  0.0384176  -1.383   0.1760    
deficit.l1 -0.0003969  0.0080682  -0.049   0.9611    
gsci.l1     0.0180458  0.0091531   1.972   0.0571 .  
cpi.l1      0.9359941  0.0721704  12.969 1.67e-14 ***
const       0.4906491  0.2832965   1.732   0.0926 .  
trend       0.0005609  0.0004745   1.182   0.2456    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.00636 on 33 degrees of freedom
Multiple R-Squared: 0.9962, Adjusted R-squared: 0.9957 
F-statistic:  1741 on 5 and 33 DF,  p-value: < 2.2e-16 



Covariance matrix of residuals:
               dxy    deficit       gsci        cpi
dxy      4.217e-04 -0.0001257 -0.0003646 -1.961e-05
deficit -1.257e-04  0.0152973  0.0011320  2.837e-04
gsci    -3.646e-04  0.0011320  0.0073263  4.279e-04
cpi     -1.961e-05  0.0002837  0.0004279  4.046e-05

Correlation matrix of residuals:
            dxy deficit    gsci     cpi
dxy      1.0000 -0.0495 -0.2075 -0.1501
deficit -0.0495  1.0000  0.1069  0.3606
gsci    -0.2075  0.1069  1.0000  0.7860
cpi     -0.1501  0.3606  0.7860  1.0000
Code
summary(fit <- VAR(ts_df1, p=2, type="both"))

VAR Estimation Results:
========================= 
Endogenous variables: dxy, deficit, gsci, cpi 
Deterministic variables: both 
Sample size: 38 
Log Likelihood: 359.881 
Roots of the characteristic polynomial:
0.8411 0.7931 0.7931 0.7212 0.7212 0.5164 0.5164 0.04401
Call:
VAR(y = ts_df1, p = 2, type = "both")


Estimation results for equation dxy: 
==================================== 
dxy = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + dxy.l2 + deficit.l2 + gsci.l2 + cpi.l2 + const + trend 

            Estimate Std. Error t value Pr(>|t|)    
dxy.l1      0.860943   0.176984   4.865 4.02e-05 ***
deficit.l1 -0.034702   0.031732  -1.094   0.2835    
gsci.l1    -0.049823   0.069952  -0.712   0.4822    
cpi.l1      1.999967   0.992855   2.014   0.0537 .  
dxy.l2     -0.374902   0.180288  -2.079   0.0468 *  
deficit.l2  0.034468   0.033392   1.032   0.3108    
gsci.l2     0.034555   0.050941   0.678   0.5031    
cpi.l2     -1.375498   0.904893  -1.520   0.1397    
const      -0.959290   1.024171  -0.937   0.3569    
trend      -0.003810   0.001726  -2.208   0.0356 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.02 on 28 degrees of freedom
Multiple R-Squared: 0.8741, Adjusted R-squared: 0.8337 
F-statistic: 21.61 on 9 and 28 DF,  p-value: 2.388e-10 


Estimation results for equation deficit: 
======================================== 
deficit = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + dxy.l2 + deficit.l2 + gsci.l2 + cpi.l2 + const + trend 

             Estimate Std. Error t value Pr(>|t|)    
dxy.l1      -0.634374   0.852994  -0.744 0.463252    
deficit.l1   0.353624   0.152935   2.312 0.028333 *  
gsci.l1     -0.698797   0.337141  -2.073 0.047518 *  
cpi.l1      17.998184   4.785184   3.761 0.000794 ***
dxy.l2      -0.639547   0.868918  -0.736 0.467836    
deficit.l2  -0.189120   0.160935  -1.175 0.249838    
gsci.l2      0.376337   0.245519   1.533 0.136542    
cpi.l2     -16.702649   4.361243  -3.830 0.000662 ***
const       10.258999   4.936119   2.078 0.046958 *  
trend        0.010970   0.008318   1.319 0.197905    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.09637 on 28 degrees of freedom
Multiple R-Squared: 0.9071, Adjusted R-squared: 0.8773 
F-statistic: 30.39 on 9 and 28 DF,  p-value: 3.822e-12 


Estimation results for equation gsci: 
===================================== 
gsci = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + dxy.l2 + deficit.l2 + gsci.l2 + cpi.l2 + const + trend 

            Estimate Std. Error t value Pr(>|t|)  
dxy.l1     -1.030868   0.765116  -1.347   0.1887  
deficit.l1  0.109221   0.137179   0.796   0.4326  
gsci.l1     0.642615   0.302408   2.125   0.0425 *
cpi.l1      1.223680   4.292199   0.285   0.7777  
dxy.l2     -0.074839   0.779399  -0.096   0.9242  
deficit.l2  0.192069   0.144355   1.331   0.1941  
gsci.l2    -0.080255   0.220224  -0.364   0.7183  
cpi.l2     -0.244303   3.911934  -0.062   0.9506  
const      -1.235445   4.427584  -0.279   0.7823  
trend      -0.004332   0.007461  -0.581   0.5662  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.08644 on 28 degrees of freedom
Multiple R-Squared: 0.8971, Adjusted R-squared: 0.864 
F-statistic: 27.13 on 9 and 28 DF,  p-value: 1.546e-11 


Estimation results for equation cpi: 
==================================== 
cpi = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + dxy.l2 + deficit.l2 + gsci.l2 + cpi.l2 + const + trend 

             Estimate Std. Error t value Pr(>|t|)    
dxy.l1     -0.0345719  0.0532750  -0.649   0.5217    
deficit.l1 -0.0123101  0.0095518  -1.289   0.2080    
gsci.l1    -0.0072755  0.0210566  -0.346   0.7323    
cpi.l1      1.3978230  0.2988658   4.677  6.7e-05 ***
dxy.l2     -0.0155966  0.0542696  -0.287   0.7759    
deficit.l2  0.0241018  0.0100515   2.398   0.0234 *  
gsci.l2    -0.0009416  0.0153342  -0.061   0.9515    
cpi.l2     -0.4129302  0.2723880  -1.516   0.1407    
const       0.2221791  0.3082927   0.721   0.4771    
trend       0.0002623  0.0005195   0.505   0.6175    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.006019 on 28 degrees of freedom
Multiple R-Squared: 0.997,  Adjusted R-squared: 0.9961 
F-statistic:  1040 on 9 and 28 DF,  p-value: < 2.2e-16 



Covariance matrix of residuals:
               dxy    deficit       gsci        cpi
dxy      3.998e-04 -0.0005604 -0.0004343 -3.436e-05
deficit -5.604e-04  0.0092876  0.0020977  2.978e-04
gsci    -4.343e-04  0.0020977  0.0074725  4.054e-04
cpi     -3.436e-05  0.0002978  0.0004054  3.623e-05

Correlation matrix of residuals:
            dxy deficit    gsci     cpi
dxy      1.0000 -0.2908 -0.2512 -0.2855
deficit -0.2908  1.0000  0.2518  0.5134
gsci    -0.2512  0.2518  1.0000  0.7791
cpi     -0.2855  0.5134  0.7791  1.0000
Code
summary(fit <- VAR(ts_df1, p=6, type="both"))

VAR Estimation Results:
========================= 
Endogenous variables: dxy, deficit, gsci, cpi 
Deterministic variables: both 
Sample size: 34 
Log Likelihood: 420.574 
Roots of the characteristic polynomial:
0.9774 0.9774 0.9612 0.9612 0.9551 0.9537 0.9537 0.9456 0.9456 0.914 0.8946 0.8946 0.8791 0.8791 0.874 0.874 0.8692 0.8692 0.8455 0.8455 0.8179 0.8179 0.4034 0.04809
Call:
VAR(y = ts_df1, p = 6, type = "both")


Estimation results for equation dxy: 
==================================== 
dxy = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + dxy.l2 + deficit.l2 + gsci.l2 + cpi.l2 + dxy.l3 + deficit.l3 + gsci.l3 + cpi.l3 + dxy.l4 + deficit.l4 + gsci.l4 + cpi.l4 + dxy.l5 + deficit.l5 + gsci.l5 + cpi.l5 + dxy.l6 + deficit.l6 + gsci.l6 + cpi.l6 + const + trend 

            Estimate Std. Error t value Pr(>|t|)  
dxy.l1      0.188522   0.278061   0.678   0.5169  
deficit.l1 -0.114187   0.063291  -1.804   0.1089  
gsci.l1    -0.088524   0.097296  -0.910   0.3895  
cpi.l1      3.503156   1.427231   2.455   0.0397 *
dxy.l2      0.063782   0.272373   0.234   0.8207  
deficit.l2 -0.023496   0.063565  -0.370   0.7212  
gsci.l2     0.131369   0.096624   1.360   0.2110  
cpi.l2     -1.958538   2.251024  -0.870   0.4096  
dxy.l3     -0.447982   0.274821  -1.630   0.1417  
deficit.l3 -0.138874   0.067914  -2.045   0.0751 .
gsci.l3    -0.110289   0.095874  -1.150   0.2832  
cpi.l3      0.835259   1.921125   0.435   0.6752  
dxy.l4     -0.251535   0.275764  -0.912   0.3884  
deficit.l4  0.002074   0.066215   0.031   0.9758  
gsci.l4    -0.171482   0.095911  -1.788   0.1116  
cpi.l4      0.707427   1.943709   0.364   0.7253  
dxy.l5     -0.320172   0.227445  -1.408   0.1969  
deficit.l5 -0.007782   0.056297  -0.138   0.8935  
gsci.l5     0.084407   0.081619   1.034   0.3313  
cpi.l5     -1.363133   1.779101  -0.766   0.4656  
dxy.l6     -0.459738   0.277596  -1.656   0.1363  
deficit.l6  0.083523   0.053758   1.554   0.1589  
gsci.l6     0.059224   0.061525   0.963   0.3639  
cpi.l6      0.465138   1.282731   0.363   0.7263  
const       1.118637   1.802790   0.621   0.5522  
trend      -0.008372   0.002630  -3.184   0.0129 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.01492 on 8 degrees of freedom
Multiple R-Squared: 0.9795, Adjusted R-squared: 0.9156 
F-statistic: 15.33 on 25 and 8 DF,  p-value: 0.0002289 


Estimation results for equation deficit: 
======================================== 
deficit = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + dxy.l2 + deficit.l2 + gsci.l2 + cpi.l2 + dxy.l3 + deficit.l3 + gsci.l3 + cpi.l3 + dxy.l4 + deficit.l4 + gsci.l4 + cpi.l4 + dxy.l5 + deficit.l5 + gsci.l5 + cpi.l5 + dxy.l6 + deficit.l6 + gsci.l6 + cpi.l6 + const + trend 

             Estimate Std. Error t value Pr(>|t|)  
dxy.l1      -0.669424   1.194355  -0.560   0.5905  
deficit.l1   0.469621   0.271852   1.727   0.1223  
gsci.l1     -0.750854   0.417916  -1.797   0.1101  
cpi.l1      10.635835   6.130392   1.735   0.1210  
dxy.l2      -1.644347   1.169923  -1.406   0.1975  
deficit.l2  -0.507760   0.273033  -1.860   0.1000 .
gsci.l2     -0.263142   0.415029  -0.634   0.5438  
cpi.l2      -3.749152   9.668833  -0.388   0.7083  
dxy.l3      -0.794137   1.180438  -0.673   0.5201  
deficit.l3   0.380241   0.291713   1.303   0.2287  
gsci.l3     -0.314116   0.411807  -0.763   0.4675  
cpi.l3       9.897744   8.251813   1.199   0.2647  
dxy.l4       0.635921   1.184490   0.537   0.6060  
deficit.l4  -0.118016   0.284415  -0.415   0.6891  
gsci.l4      0.557226   0.411966   1.353   0.2132  
cpi.l4      -7.712312   8.348820  -0.924   0.3826  
dxy.l5      -1.962701   0.976947  -2.009   0.0794 .
deficit.l5  -0.203957   0.241814  -0.843   0.4235  
gsci.l5     -0.786869   0.350578  -2.244   0.0550 .
cpi.l5       8.431582   7.641778   1.103   0.3020  
dxy.l6       0.903502   1.192359   0.758   0.4703  
deficit.l6  -0.219157   0.230905  -0.949   0.3703  
gsci.l6      0.233288   0.264270   0.883   0.4031  
cpi.l6     -11.240777   5.509717  -2.040   0.0757 .
const        3.608153   7.743532   0.466   0.6537  
trend       -0.006603   0.011295  -0.585   0.5749  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual standard error: 0.06409 on 8 degrees of freedom
Multiple R-Squared: 0.986,  Adjusted R-squared: 0.9422 
F-statistic: 22.51 on 25 and 8 DF,  p-value: 5.357e-05 


Estimation results for equation gsci: 
===================================== 
gsci = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + dxy.l2 + deficit.l2 + gsci.l2 + cpi.l2 + dxy.l3 + deficit.l3 + gsci.l3 + cpi.l3 + dxy.l4 + deficit.l4 + gsci.l4 + cpi.l4 + dxy.l5 + deficit.l5 + gsci.l5 + cpi.l5 + dxy.l6 + deficit.l6 + gsci.l6 + cpi.l6 + const + trend 

            Estimate Std. Error t value Pr(>|t|)
dxy.l1      -0.02834    1.99586  -0.014    0.989
deficit.l1   0.51395    0.45429   1.131    0.291
gsci.l1      0.22994    0.69837   0.329    0.750
cpi.l1       0.49998   10.24435   0.049    0.962
dxy.l2      -1.49073    1.95503  -0.763    0.468
deficit.l2   0.34165    0.45626   0.749    0.475
gsci.l2     -0.11959    0.69355  -0.172    0.867
cpi.l2      -0.06070   16.15735  -0.004    0.997
dxy.l3       0.30408    1.97260   0.154    0.881
deficit.l3   0.29492    0.48748   0.605    0.562
gsci.l3      0.28301    0.68816   0.411    0.692
cpi.l3       1.89453   13.78941   0.137    0.894
dxy.l4       1.40984    1.97937   0.712    0.497
deficit.l4   0.12161    0.47528   0.256    0.805
gsci.l4     -0.01759    0.68843  -0.026    0.980
cpi.l4       2.31895   13.95151   0.166    0.872
dxy.l5      -0.42458    1.63255  -0.260    0.801
deficit.l5  -0.09938    0.40409  -0.246    0.812
gsci.l5      0.26122    0.58584   0.446    0.667
cpi.l5      -6.75397   12.76999  -0.529    0.611
dxy.l6       1.41084    1.99252   0.708    0.499
deficit.l6  -0.15247    0.38586  -0.395    0.703
gsci.l6     -0.12111    0.44162  -0.274    0.791
cpi.l6       2.57117    9.20715   0.279    0.787
const      -16.81991   12.94003  -1.300    0.230
trend       -0.02132    0.01887  -1.130    0.291


Residual standard error: 0.1071 on 8 degrees of freedom
Multiple R-Squared: 0.9443, Adjusted R-squared: 0.7703 
F-statistic: 5.426 on 25 and 8 DF,  p-value: 0.009051 


Estimation results for equation cpi: 
==================================== 
cpi = dxy.l1 + deficit.l1 + gsci.l1 + cpi.l1 + dxy.l2 + deficit.l2 + gsci.l2 + cpi.l2 + dxy.l3 + deficit.l3 + gsci.l3 + cpi.l3 + dxy.l4 + deficit.l4 + gsci.l4 + cpi.l4 + dxy.l5 + deficit.l5 + gsci.l5 + cpi.l5 + dxy.l6 + deficit.l6 + gsci.l6 + cpi.l6 + const + trend 

             Estimate Std. Error t value Pr(>|t|)
dxy.l1      0.0447305  0.1207314   0.370    0.721
deficit.l1  0.0058585  0.0274802   0.213    0.837
gsci.l1    -0.0045955  0.0422451  -0.109    0.916
cpi.l1      0.7886579  0.6196909   1.273    0.239
dxy.l2     -0.1248322  0.1182617  -1.056    0.322
deficit.l2  0.0182774  0.0275995   0.662    0.526
gsci.l2    -0.0156250  0.0419533  -0.372    0.719
cpi.l2     -0.1946339  0.9773744  -0.199    0.847
dxy.l3     -0.0129803  0.1193246  -0.109    0.916
deficit.l3  0.0165136  0.0294879   0.560    0.591
gsci.l3    -0.0166860  0.0416275  -0.401    0.699
cpi.l3      0.8414897  0.8341348   1.009    0.343
dxy.l4      0.0505925  0.1197342   0.423    0.684
deficit.l4  0.0201891  0.0287501   0.702    0.502
gsci.l4    -0.0125725  0.0416436  -0.302    0.770
cpi.l4      0.0383578  0.8439408   0.045    0.965
dxy.l5     -0.0194012  0.0987547  -0.196    0.849
deficit.l5  0.0053613  0.0244438   0.219    0.832
gsci.l5     0.0282938  0.0354382   0.798    0.448
cpi.l5     -0.4281220  0.7724694  -0.554    0.595
dxy.l6      0.0370442  0.1205297   0.307    0.766
deficit.l6 -0.0127183  0.0233411  -0.545    0.601
gsci.l6    -0.0277421  0.0267137  -1.038    0.329
cpi.l6      0.0367840  0.5569500   0.066    0.949
const      -0.6637172  0.7827552  -0.848    0.421
trend      -0.0007543  0.0011417  -0.661    0.527


Residual standard error: 0.006479 on 8 degrees of freedom
Multiple R-Squared: 0.9988, Adjusted R-squared: 0.9951 
F-statistic: 267.1 on 25 and 8 DF,  p-value: 3.206e-09 



Covariance matrix of residuals:
               dxy   deficit       gsci       cpi
dxy      2.227e-04 5.909e-05 -4.855e-05 9.372e-06
deficit  5.909e-05 4.108e-03  2.520e-03 1.162e-04
gsci    -4.855e-05 2.520e-03  1.147e-02 6.200e-04
cpi      9.372e-06 1.162e-04  6.200e-04 4.198e-05

Correlation matrix of residuals:
             dxy deficit     gsci     cpi
dxy      1.00000 0.06178 -0.03038 0.09694
deficit  0.06178 1.00000  0.36701 0.27992
gsci    -0.03038 0.36701  1.00000 0.89346
cpi      0.09694 0.27992  0.89346 1.00000

The model with p=1 or p=2 looks good. However, only a few variables are significant in the model with p=6, suggesting that VAR(6) may not be the best choice.

Code
fun.var <- function(ts, year, p, s){
  fit <- VAR(ts, p=p, type='both')
  fcast <- predict(fit, n.ahead = s)
  
  f1<-fcast$fcst$dxy
  f2<-fcast$fcst$deficit
  f3<-fcast$fcst$gsci
  f4<-fcast$fcst$cpi
  ff<-data.frame(f1[,1],f2[,1],f3[,1],f4[,1])
  ff<-ts(ff,start=c(year,1),frequency = s)
  return(ff)
}

data <- ts_df1
n=nrow(data)
n_var = ncol(data)
h <- 4  # h: Forecast horizon
# k: Initial training set
# Calculate k as 1/3rd of the data, rounded down to the nearest multiple of 12
k <- floor(n / 3 / h) * h
num_iter <- (n - k) / h  # Number of rolling iterations

# Initialize matrices for RMSE
rmse1 <- matrix(NA, nrow = n-k, ncol = n_var)  # RMSE for Model 1
rmse2 <- matrix(NA, nrow = n-k, ncol = n_var)  # RMSE for Model 2

# Define rolling start time
st <- tsp(data)[1] + (k - 1) / h 

# Walk-Forward Validation Loop
for (i in 1:num_iter) {
  xtrain <- window(data, end = st + i - 1)
  xtest <- window(data, start = st + (i - 1) + 1/h, end = st + i)  # Test set for the next 12 months
  test_start_year <- st + (i-1) + 1/h #starting year for predication, i.e. xtest

  ######## VAR(1) ############
  ff1 <- fun.var(xtrain, test_start_year, p=1, s=h)
  ######## VAR(2) ############
  ff2 <- fun.var(xtrain, test_start_year, p=2, s=h)
  
  ##### collecting errors ######
  a = h*i-h+1
  b= h*i
  rmse1[c(a:b),]  <-abs(ff1-xtest)
  rmse2[c(a:b),]  <-abs(ff2-xtest)
}

rmse_combined <- as.data.frame(rbind(rmse1, rmse2))
colnames(rmse_combined) = c("usd","deficit","gsci","cpi")
rmse_combined$Model <- c(rep("VAR(1)", n-k),rep("VAR(2)", n-k))
rmse_combined$Date <- time(data)[(k+1):n]

# Create the USD RMSE plot with a legend
ggplot(data = rmse_combined, aes(x = Date, y = usd, color = Model)) + 
  geom_line() +
  labs(
    title = "CV Error for USD",
    x = "Date",
    y = "Error",
    color = "Model"
  ) +
  theme_minimal()

VAR(1) is slightly better.

Code
# Fit a VAR(1) model including both a constant and trend
fit <- VAR(data, p = 1, type = "both")

forecast(fit) %>%
  autoplot() + xlab("Year")

I believe the forecast is very good. The VAR(1) model has effectively captured the pattern of four varibales.

VAR: USD ~ Interest Rate + CPI + Unemployment Rate + GDP

VAR: USD ~ Stock Price + Bitcoin Price

VAR: USD ~ Oil Price + Gold Price

SARIMAX: USD ~ Egg Price

In this analysis, we will explore how the domestic market, specifically egg prices, affects the USD.

Code
df5 <- data.frame(Date = dxy_m$month, dxy = log(dxy_m$Close), egg = log(egg$Price))

plot_usd <- plot_ly(df5, x = ~Date, y = ~dxy, type = 'scatter', mode = 'lines', name = 'U.S. Dollar')
plot_egg <- plot_ly(df5, x = ~Date, y = ~egg, type = 'scatter', mode = 'lines', name = 'Egg Price') 

subplot(plot_usd, plot_egg, nrows = 2, shareX = TRUE) %>%
  layout(title = "Trend of U.S. Dollar and Egg Price", showlegend = FALSE,
    xaxis = list(title = 'Date'),
    yaxis = list(title = 'U.S. Dollar'),
    yaxis2 = list(title = 'Egg Price'))

The trend of the USD and egg prices is very similar. Thus, we can fit an ARIMAX model using the USD as the dependent variable and egg prices as the exogenous variable.

Code
ts_df5 <- ts(df5, start = c(2005, 1), frequency = 12)
auto.arima(ts_df5[,"dxy"], xreg = ts_df5[,"egg"])
Series: ts_df5[, "dxy"] 
Regression with ARIMA(1,1,2)(2,0,1)[12] errors 

Coefficients:
          ar1     ma1     ma2    sar1     sar2    sma1    xreg
      -0.9684  1.2983  0.3595  -0.704  -0.1496  0.5654  0.0019
s.e.   0.0264  0.0678  0.0635   0.449   0.0765  0.4505  0.0125

sigma^2 = 0.0002711:  log likelihood = 645.57
AIC=-1275.13   AICc=-1274.5   BIC=-1247.32
Code
# Fit a linear model
lm_fit.5 <- lm(dxy ~ egg, data = df5)
res.5 <- ts(residuals(lm_fit.5), start = c(2005, 1), frequency = 12)

# ACF and PACF plots of the residuals
ggtsdisplay(diff(res.5), main = "Differenced residuals") # p=0:1, d=1, q=0:1, P=0:3, D=1, Q=0:1

Code
# Manual search
output=SARIMA.c(p1=0,p2=1,q1=0,q2=1,P1=0,P2=3,Q1=0,Q2=1,s=12,data=res.5)
highlight_output(output)
Comparison of ARIMA Models
p d q P D Q AIC BIC AICc
0 1 0 0 1 0 -955.7432 -952.3182 -955.7254
0 1 0 0 1 1 -1097.7315 -1090.8816 -1097.6779
0 1 0 1 1 0 -1028.9996 -1022.1497 -1028.9460
0 1 0 1 1 1 -1097.9083 -1087.6334 -1097.8007
0 1 0 2 1 0 -1068.5904 -1058.3155 -1068.4828
0 1 0 2 1 1 -1095.9367 -1082.2369 -1095.7565
0 1 0 3 1 0 -1071.0916 -1057.3918 -1070.9114
0 1 0 3 1 1 -1095.7146 -1078.5899 -1095.4431
0 1 1 0 1 0 -972.4749 -965.6250 -972.4213
0 1 1 0 1 1 -1108.1003 -1097.8254 -1107.9927
0 1 1 1 1 0 -1045.4959 -1035.2210 -1045.3882
0 1 1 1 1 1 -1107.3199 -1093.6201 -1107.1398
0 1 1 2 1 0 -1076.4356 -1062.7358 -1076.2554
0 1 1 2 1 1 -1105.3775 -1088.2528 -1105.1060
0 1 1 3 1 0 -1078.2845 -1061.1598 -1078.0130
0 1 1 3 1 1 -1104.3276 -1083.7779 -1103.9458
1 1 0 0 1 0 -975.9562 -969.1063 -975.9027
1 1 0 0 1 1 -1110.0948 -1099.8199 -1109.9872
1 1 0 1 1 0 -1047.1080 -1036.8331 -1047.0003
1 1 0 1 1 1 -1109.1529 -1095.4531 -1108.9727
1 1 0 2 1 0 -1077.3774 -1063.6776 -1077.1972
1 1 0 2 1 1 -1107.1979 -1090.0731 -1106.9264
1 1 0 3 1 0 -1079.5166 -1062.3919 -1079.2451
1 1 0 3 1 1 -1105.6125 -1085.0628 -1105.2306
1 1 1 0 1 0 -974.5777 -964.3029 -974.4701
1 1 1 0 1 1 -1108.5119 -1094.8121 -1108.3317
1 1 1 1 1 0 -1045.1624 -1031.4626 -1044.9822
1 1 1 1 1 1 -1107.5445 -1090.4198 -1107.2730
1 1 1 2 1 0 -1075.5337 -1058.4089 -1075.2622
1 1 1 2 1 1 -1105.5719 -1085.0222 -1105.1901
1 1 1 3 1 0 -1077.8525 -1057.3028 -1077.4707
1 1 1 3 1 1 -1104.2687 -1080.2940 -1103.7573
Code
# Model Diagnostics
model_output <- capture.output(sarima(res.5, 1,1,0,0,1,1,12))

Code
start_line <- grep("Coefficients", model_output)
end_line <- length(model_output)
cat(model_output[start_line:end_line], sep = "\n")
Coefficients: 
     Estimate     SE  t.value p.value
ar1    0.2471 0.0641   3.8530   2e-04
sma1  -1.0000 0.0612 -16.3362   0e+00

sigma^2 estimated as 0.0003659517 on 225 degrees of freedom 
 
AIC = -4.890285  AICc = -4.890049  BIC = -4.845022 
 

The best model for the residuals of the linear model is ARIMA(1,1,0)(0,1,1)[12].

The Residual Plot of the ARIMA model shows nearly consistent fluctuation around zero, suggesting that the residuals are nearly stationary with a constant mean and finite variance over time.

The Autocorrelation Function (ACF) of the residuals shows mostly independence.

The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.

The Ljung-Box Test p-values are mostly above the 0.05 significance level, implying that a few autocorrelations are left in the residuals.

All coefficients are significant at the 5% significant level.

Code
# Define parameters
data <- ts_df5
n <- nrow(data)  # Total observations
h <- 12  # h: Forecast horizon
# k: Initial training set
# Calculate k as 1/3rd of the data, rounded down to the nearest multiple of 12
k <- floor(n / 3 / h) * h
num_iter <- (n - k) / h  # Number of rolling iterations

# Initialize matrices for RMSE
rmse1 <- matrix(NA, nrow = num_iter, ncol = h)  # RMSE for Model 1
rmse2 <- matrix(NA, nrow = num_iter, ncol = h)  # RMSE for Model 2

# Define rolling start time
st <- tsp(data)[1] + (k - 1) / h 

# Walk-Forward Validation Loop
for (i in 1:num_iter) {
  xtrain <- window(data, end = st + i - 1)
  xtest <- window(data, start = st + (i - 1) + 1/h, end = st + i)  # Test set for the next 12 months

  # Fit auto.arima() model
  model.1 <- Arima(xtrain[,"dxy"], order = c(1,1,2), 
                  seasonal = list(order = c(2,0,1), period = 12),
                  xreg = xtrain[, "egg"],
                  optim.control = list(maxit = 10000))   
  f.1 <-   forecast(model.1, xreg = xtest[, "egg"], h = h)
  rmse1[i,] <-  (f.1$mean - xtest[,"dxy"])^2

  ###### Fit manual model
  model.2 <- Arima(xtrain[, "dxy"], order = c(1,1,0),
                  seasonal = list(order = c(0,1,1), period = 12),
                  xreg = xtrain[, "egg"],
                  optim.control = list(maxit = 10000))
  f.2 <- forecast(model.2, xreg = xtest[, "egg"], h = h)
  rmse2[i,] <-  (f.2$mean - xtest[,"dxy"])^2
}   

# Compute RMSE across all iterations
rmse1_avg <- sqrt(colMeans(rmse1, na.rm = TRUE))
rmse2_avg <- sqrt(colMeans(rmse2, na.rm = TRUE))

# Create a DataFrame for better visualization
error_table <- data.frame(
    Horizon = 1:h,
    RMSE_Model1 = rmse1_avg,
    RMSE_Model2 = rmse2_avg
)

# **Improved RMSE Plot using ggplot2**
ggplot(error_table, aes(x = Horizon)) +
  geom_line(aes(y = RMSE_Model1, color = "Regression with ARIMA(1,1,2)(2,0,1)[12] errors"), size = 1) +
  geom_line(aes(y = RMSE_Model2, color = "Regression with ARIMA(1,1,0)(0,1,1)[12] errors"), size = 1) +
  labs(title = "RMSE Comparison for 12-Step Forecasts",
       x = "Forecast Horizon (Months Ahead)",
       y = "Root Mean Squared Error (RMSE)") +
  scale_color_manual(name = "Models", values = c("red", "blue")) +
  theme_minimal()

ARIMA(1,1,2)(2,0,1)[12] looks better.

Fit and summarize the best model:

Code
best_model <- Arima(data[,"dxy"], order = c(1,1,2), 
                seasonal = list(order = c(2,0,1), period = 12),
                xreg = data[, "egg"],
                optim.control = list(maxit = 10000))   
summary(best_model)
Series: data[, "dxy"] 
Regression with ARIMA(1,1,2)(2,0,1)[12] errors 

Coefficients:
          ar1     ma1     ma2    sar1     sar2    sma1    xreg
      -0.9684  1.2983  0.3595  -0.704  -0.1496  0.5654  0.0019
s.e.   0.0264  0.0678  0.0635   0.449   0.0765  0.4505  0.0125

sigma^2 = 0.0002711:  log likelihood = 645.57
AIC=-1275.13   AICc=-1274.5   BIC=-1247.32

Training set error measures:
                       ME       RMSE        MAE        MPE    MAPE      MASE
Training set 0.0009160148 0.01618699 0.01272872 0.01965128 0.28394 0.2172217
                    ACF1
Training set -0.01524275

Model Equation

\[ \begin{gathered} y_t=\beta x_t+n_t \\ \Phi_P\left(B^s\right) \varphi(B) \nabla_s^D \nabla^d n_t=\delta+\Theta_Q\left(B^s\right) \theta(B) w_t \end{gathered} \]

With the parameters from the model: \[ \begin{gathered} y_t=\beta x_t+n_t \\ y_t=0.0019 \text { Egg Price }+n_t \end{gathered} \]

With \(\operatorname{ARIMA}(1,1,2)(2,0,1)[12]\) errors: \[ \begin{gathered} \Phi_P\left(B^{12}\right) \varphi(B) \nabla_{12}^{D=0} \nabla^{d=1} n_t=\Theta_Q\left(B^{12}\right) \theta(B) w_t \\ (1-\Phi_1 B^{12} - \Phi_2 B^{24})\left(1-\phi_1 B\right)(1-B) n_t=\left(1+\Theta B^{12}\right)\left(1+\theta_1 B+\theta_2 B^2\right) w_t \\ (1+0.704 B^{12} +0.1496 B^{24})\left(1+0.9684 B\right)(1-B) n_t=\left(1+0.5654 B^{12}\right)\left(1+1.2983 B+0.3595 B^2\right) w_t \end{gathered} \]

Code
arima_egg <- auto.arima(ts_df5[,"egg"])
f_egg <- forecast(arima_egg, h = h)
f.5 <-   forecast(best_model, xreg = f_egg$mean, h = h)
autoplot(f.5) +
  labs(title = "12-Month Forecast of Log-Transformed U.S. Dollar",
       x = "Time", y = "Log-transformed USD") +
  theme_minimal()

I believe the forecast is very good. By adding the exogenous variable (egg price) to the SARIMA model, the SARIMAX model has effectively captured the pattern of USD.

ARIMAX: USD ~ House Price + International Visitors

Here, we will examine how the real estate and tourism market affect the USD.

Code
visitors_q <- visitors %>%
  mutate(quarter = floor_date(time, "quarter")) %>% 
  group_by(quarter) %>%
  summarise(count = mean(count, na.rm = TRUE))
df6 <- data.frame(Date = dxy_q$quarter, 
                  dxy = log(dxy_q$Close), 
                  house = log(house$index),
                  visitors = log(visitors_q$count))
plot_usd <- plot_ly(df6, x = ~Date, y = ~dxy, type = 'scatter', mode = 'lines', name = 'U.S. Dollar')
plot_house <- plot_ly(df6, x = ~Date, y = ~house, type = 'scatter', mode = 'lines', name = 'House Price') 
plot_visitors <- plot_ly(df6, x = ~Date, y = ~visitors, type = 'scatter', mode = 'lines', name = 'Visitors') 

subplot(plot_usd, plot_house, plot_visitors, nrows = 3, shareX = TRUE) %>%
  layout(title = "Trend of U.S. Dollar and Related Variables ", showlegend = FALSE,
    xaxis = list(title = 'Date'),
    yaxis = list(title = 'U.S. Dollar'),
    yaxis2 = list(title = 'House Price'),
    yaxis3 = list(title = "Num of Intl Visitors"))

Both the USD and house prices show an upward trend. Thus, we can fit an ARIMAX model using the USD as the dependent variable, house prices and international visitors as the exogenous variables.

Code
ts_df6 <- ts(df6, start = c(2005, 1), frequency = 4)
auto.arima(ts_df6[,"dxy"], xreg = ts_df6[,c("house", "visitors")])
Series: ts_df6[, "dxy"] 
Regression with ARIMA(0,1,1) errors 

Coefficients:
         ma1   house  visitors
      0.4995  0.3585   -0.0118
s.e.  0.0878  0.2114    0.0071

sigma^2 = 0.0007382:  log likelihood = 174.14
AIC=-340.28   AICc=-339.74   BIC=-330.8
Code
# Fit a linear model
lm_fit.6 <- lm(dxy ~ house + visitors, data = df6)
res.6 <- ts(residuals(lm_fit.6), start = c(2005, 1), frequency = 4)

# ACF and PACF plots of the residuals
ggtsdisplay(diff(res.6), main = "Differenced Residuals") # p=0:2, d=1, q=0:3

Code
# Manual search
output=ARIMA.c(p1=0,p2=2,q1=0,q2=3,data=res.6)
highlight_output(output)
Comparison of ARIMA Models
p d q AIC BIC AICc
0 1 0 -315.2008 -310.4620 -315.0430
0 1 1 -325.5822 -318.4739 -325.2622
0 1 2 -323.6752 -314.1974 -323.1346
0 1 3 -326.5282 -314.6809 -325.7062
1 1 0 -321.2992 -314.1908 -320.9792
1 1 1 -323.6205 -314.1428 -323.0800
1 1 2 -328.3456 -316.4983 -327.5237
1 1 3 -327.2125 -312.9958 -326.0458
2 1 0 -327.2298 -317.7520 -326.6893
2 1 1 -327.7637 -315.9165 -326.9418
2 1 2 -326.5323 -312.3157 -325.3657
2 1 3 -324.5698 -307.9837 -322.9924
Code
# Model Diagnostics
model_output <- capture.output(sarima(res.6, 0,1,1))

Code
start_line <- grep("Coefficients", model_output)
end_line <- length(model_output)
cat(model_output[start_line:end_line], sep = "\n")
Coefficients: 
         Estimate     SE t.value p.value
ma1        0.4289 0.0995  4.3111  0.0000
constant  -0.0001 0.0047 -0.0299  0.9763

sigma^2 estimated as 0.0008781503 on 77 degrees of freedom 
 
AIC = -4.121294  AICc = -4.119295  BIC = -4.031315 
 
Code
model_output <- capture.output(sarima(res.6, 1,1,2))

Code
start_line <- grep("Coefficients", model_output)
end_line <- length(model_output)
cat(model_output[start_line:end_line], sep = "\n")
Coefficients: 
         Estimate     SE t.value p.value
ar1        0.7855 0.0786  9.9877  0.0000
ma1       -0.5163 0.1041 -4.9618  0.0000
ma2       -0.4837 0.0977 -4.9491  0.0000
constant   0.0005 0.0008  0.5959  0.5531

sigma^2 estimated as 0.0007832046 on 75 degrees of freedom 
 
AIC = -4.156273  AICc = -4.149431  BIC = -4.006308 
 

The best model for the residuals of the linear model is ARIMA(0,1,1), since it has lower AIC and AICc values.

For both models:

The Residual Plot of the ARIMA model shows nearly consistent fluctuation around zero, suggesting that the residuals are nearly stationary with a constant mean and finite variance over time.

The Autocorrelation Function (ACF) of the residuals shows perfectly independence.

The Q-Q Plot indicates that the residuals follow a near-normal distribution, with minor deviations at the tails, which is typical in time series data.

The Ljung-Box Test p-values are all above the 0.05 significance level, implying that no autocorrelations are left in the residuals.

All coefficients are significant at the 5% significant level.

Code
# Define parameters
data <- ts_df6
n <- nrow(data)  # Total observations
h <- 4  # h: Forecast horizon
# k: Initial training set
# Calculate k as 1/3rd of the data, rounded down to the nearest multiple of 4
k <- floor(n / 3 / h) * h
num_iter <- (n - k) / h  # Number of rolling iterations

# Initialize matrices for RMSE
rmse1 <- matrix(NA, nrow = num_iter, ncol = h)  # RMSE for Model 1
rmse2 <- matrix(NA, nrow = num_iter, ncol = h)  # RMSE for Model 2

# Define rolling start time
st <- tsp(data)[1] + (k - 1) / h 

# Walk-Forward Validation Loop
for (i in 1:num_iter) {
  xtrain <- window(data, end = st + i - 1)
  xtest <- window(data, start = st + (i - 1) + 1/h, end = st + i)  # Test set for the next 4 months

  # Fit auto.arima() model
  model.1 <- Arima(xtrain[,"dxy"], order = c(0,1,1), 
                  xreg = xtrain[, c("house", "visitors")],
                  optim.control = list(maxit = 10000))   
  f.1 <-   forecast(model.1, xreg = xtest[, c("house", "visitors")], h = h)
  rmse1[i,] <-  (f.1$mean - xtest[,"dxy"])^2

  ###### Fit manual model
  model.2 <- Arima(xtrain[, "dxy"], order = c(1,1,2),
                  xreg = xtrain[, c("house", "visitors")],
                  optim.control = list(maxit = 10000))
  f.2 <- forecast(model.2, xreg = xtest[, c("house", "visitors")], h = h)
  rmse2[i,] <-  (f.2$mean - xtest[,"dxy"])^2
}   

# Compute RMSE across all iterations
rmse1_avg <- sqrt(colMeans(rmse1, na.rm = TRUE))
rmse2_avg <- sqrt(colMeans(rmse2, na.rm = TRUE))

# Create a DataFrame for better visualization
error_table <- data.frame(
    Horizon = 1:h,
    RMSE_Model1 = rmse1_avg,
    RMSE_Model2 = rmse2_avg
)

# **Improved RMSE Plot using ggplot2**
ggplot(error_table, aes(x = Horizon)) +
  geom_line(aes(y = RMSE_Model1, color = "Regression with ARIMA(0,1,1) errors"), size = 1) +
  geom_line(aes(y = RMSE_Model2, color = "Regression with ARIMA(1,1,2) errors"), size = 1) +
  labs(title = "RMSE Comparison for 4-Step Forecasts",
       x = "Forecast Horizon (Months Ahead)",
       y = "Root Mean Squared Error (RMSE)") +
  scale_color_manual(name = "Models", values = c("red", "blue")) +
  theme_minimal()

ARIMA(0,1,1) looks better.

Fit and summarize the best model:

Code
best_model <- Arima(data[,"dxy"], order = c(0,1,1), 
                xreg = data[, c("house", "visitors")],
                optim.control = list(maxit = 10000))   
summary(best_model)
Series: data[, "dxy"] 
Regression with ARIMA(0,1,1) errors 

Coefficients:
         ma1   house  visitors
      0.4995  0.3585   -0.0118
s.e.  0.0878  0.2114    0.0071

sigma^2 = 0.0007382:  log likelihood = 174.14
AIC=-340.28   AICc=-339.74   BIC=-330.8

Training set error measures:
                       ME       RMSE        MAE          MPE      MAPE
Training set 2.440354e-05 0.02648176 0.02082905 0.0001847146 0.4645378
                  MASE       ACF1
Training set 0.3713074 0.03777751

Model Equation

\[ \begin{gathered} y_t=\beta x_t+n_t \\ \varphi(B) \nabla^d n_t=\delta+ \theta(B) w_t \end{gathered} \]

With the parameters from the model: \[ \begin{gathered} y_t=\beta x_t+n_t \\ y_t=0.3585 \text { House Price } -0.0118 \text{International Visitors} +n_t \end{gathered} \]

With \(\operatorname{ARIMA}(0,1,1)\) errors: \[ \begin{gathered} \varphi(B) \nabla^{d=1} n_t= \theta(B) w_t \\ (1-B) n_t=\left(1+\theta_1 B\right) w_t \\ (1-B) n_t=\left(1+0.4995 B\right) w_t \end{gathered} \]

Code
arima_house <- auto.arima(ts_df6[,"house"])
f_house <- forecast(arima_house, h = h)
arima_visitors <- auto.arima(ts_df6[,"visitors"])
f_visitors <- forecast(arima_visitors, h = h)
xreg <- cbind(f_house$mean, f_visitors$mean)
colnames(xreg) <- c("house", "visitors")
f.6 <-   forecast(best_model, xreg = xreg, h = h)
autoplot(f.6) +
  labs(title = "4-Quarter Forecast of Log-Transformed U.S. Dollar",
       x = "Time", y = "Log-transformed USD") +
  theme_minimal()

I believe the forecast is very good. By adding the exogenous variables (house price and the number of international visitors) to the ARIMA model, the ARIMAX model has effectively captured the pattern of USD.